Fix OTel cache-token parsing and add real-session regression tests #23
Merged
The per-call LLM analysis read cache tokens only from underscore-style OTel attribute keys (gen_ai.usage.cache_read_input_tokens / gen_ai.usage.cache_creation_input_tokens), but the Copilot CLI emits the dotted form (gen_ai.usage.cache_read.input_tokens / gen_ai.usage.cache_creation.input_tokens). As a result, per-call cache read/write values were silently dropped and rendered empty in `analyze`, even though the session-level shutdown totals were correct. Add the dotted keys to the lookup lists so per-call cache tokens are parsed.

Also add offline regression tests driven by four captured real Copilot CLI sessions (gpt-5.5, claude-opus-4.7, mai-code-1-flash-picker, gemini-3.1-pro-preview). Golden values are cross-checked against the raw session.shutdown payload, and the strongest invariant re-derives the AIU total and cache tokens from the independent OTel chat spans, which is what surfaced this bug. Sessions contain no secrets.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
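The key-lookup change described in the commit message can be pictured roughly as follows. This is a minimal sketch, not the actual diff: the names `CACHE_READ_KEYS`, `CACHE_WRITE_KEYS`, `first_int`, and `cache_tokens` are hypothetical, and the real lookup lists in `analysis.py` may be shaped differently. Only the four attribute key strings come from the PR.

```python
from typing import Any, Mapping, Optional, Sequence, Tuple

# Hypothetical lookup lists. The dotted form is what the Copilot CLI emits;
# the underscore form is kept as a fallback for other emitters.
CACHE_READ_KEYS = (
    "gen_ai.usage.cache_read.input_tokens",
    "gen_ai.usage.cache_read_input_tokens",
)
CACHE_WRITE_KEYS = (
    "gen_ai.usage.cache_creation.input_tokens",
    "gen_ai.usage.cache_creation_input_tokens",
)


def first_int(attrs: Mapping[str, Any], keys: Sequence[str]) -> Optional[int]:
    """Return the first attribute present among `keys` as an int, else None."""
    for key in keys:
        value = attrs.get(key)
        if value is not None:
            return int(value)
    return None


def cache_tokens(attrs: Mapping[str, Any]) -> Tuple[Optional[int], Optional[int]]:
    """Extract (cache_read, cache_write) from one span's attributes."""
    return first_int(attrs, CACHE_READ_KEYS), first_int(attrs, CACHE_WRITE_KEYS)
```

With only the underscore keys in the lists, a span carrying the dotted attributes would yield `(None, None)`, which matches the empty columns described above. Returning `None` rather than `0` for a missing key keeps "not reported" distinguishable from "zero tokens".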
Summary
While double-checking the session-parsing implementation against real Copilot CLI runs (captured from a sibling experiment harness), the per-call LLM calls (OTel) table showed empty cache read/write columns — a real data-structure bug.
The bug
`llm_calls_from_otel()` looked up per-call cache tokens only via underscore-style OTel attribute keys (`gen_ai.usage.cache_read_input_tokens`, `gen_ai.usage.cache_creation_input_tokens`), but the Copilot CLI emits the dotted form:
- `gen_ai.usage.cache_read.input_tokens`
- `gen_ai.usage.cache_creation.input_tokens`
So per-call cache read/write were silently dropped (empty columns in `analyze`). Session-level `session.shutdown` totals were already correct, which is why aggregate AIU/token numbers reconciled and the bug hid from the synthetic fixtures.
The fix
Add the dotted keys to the lookup lists in `analysis.py`. Verified safe: the OTel per-call cache sums equal the `session.shutdown` cache totals exactly.
Regression tests on real data
Adds `tests/test_real_sessions.py` + 4 captured real sessions under `tests/fixtures/real_sessions/`:
- `gpt-5.5`
- `claude-opus-4.7`
- `mai-code-1-flash-picker`
- `gemini-3.1-pro-preview`
These are fully offline. Golden values are cross-checked against the raw `session.shutdown` payload, and the strongest invariant re-derives the AIU total and cache tokens from the independent OTel `chat` spans, which is exactly the check that surfaced this bug. Note that providers differ: only Anthropic reports `cache_creation` (explicit cache writes); MAI/GPT/Gemini use implicit caching (cache_read only, no separate write charge).
The real `mai` data also surfaced a 1-ULP rounding edge, so the `aiu_by_type` reconciliation assertion is relaxed to a `±1e-6` tolerance.
Testing
- `uv run ruff check .`: clean
- `uv run pytest -q`: all pass (incl. 32 new real-session assertions)
Sessions contain no secrets (BYOK keys are never written to the event log).
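For readers unfamiliar with the reconciliation invariant, here is a minimal sketch of the two comparison styles the tests rely on: exact equality for integer token counts, and an absolute tolerance for floating-point AIU totals. The helper names `tokens_reconcile` and `aiu_reconciles` are hypothetical and are not taken from `tests/test_real_sessions.py`. The inputs are assumed to be plain per-call values already extracted from the OTel spans and a total already read from `session.shutdown`.

```python
import math
from typing import Iterable


def tokens_reconcile(per_call: Iterable[int], session_total: int) -> bool:
    """Integer token counts should match the session total exactly."""
    return sum(per_call) == session_total


def aiu_reconciles(
    per_call: Iterable[float], session_total: float, tol: float = 1e-6
) -> bool:
    """Float AIU sums are compared with an absolute tolerance, since
    summing per-call values can differ from the total by rounding."""
    return math.isclose(sum(per_call), session_total, rel_tol=0.0, abs_tol=tol)
```

The tolerance matters because floating-point addition is not exact: in Python, `0.1 + 0.2 == 0.3` is `False`, while the same values pass an absolute-tolerance check. Token counts are integers, so there is no reason to loosen that comparison.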
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>